Classification Models of HCV NS3 Protease Inhibitors Based on Support Vector Machine (SVM)

Wang, M.L.; Xuan, S.Y.; Yan, A.X.*
Combinatorial Chemistry & High Throughput Screening, 2015, 18(1), 24-32.

    Inhibition of the hepatitis C virus (HCV) non-structural protein 3 (NS3) serine protease by small-molecule inhibitors is a promising strategy for the treatment of hepatitis C. We built four classification models from a dataset of 413 HCV NS3 protease inhibitors using the support vector machine (SVM) method. The best model predicted the test set with an accuracy, sensitivity (SE), specificity (SP) and Matthews correlation coefficient (MCC) of 90.76%, 92.21%, 88.10% and 0.799, respectively. The number of rotatable bonds (NRotBond) and charge- and electronegativity-related properties were found to correlate with the bioactivity of the inhibitors. An ECFP_4 fingerprint analysis of structural features found that a cyclopropyl with an acylsulfonamide group was the unique substructure in the active inhibitors. The approach presented in this study, with the dataset split by Kohonen's self-organizing map and descriptors selected by SVMAttributeEval, can be employed in virtual screening to discover novel inhibitors of HCV NS3 protease.
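
For reference, the four reported quantities are usually defined from the confusion-matrix counts as shown here. This is the standard textbook form, assumed (not confirmed by this page) to be what the authors used. The symbols TP, TN, FP and FN are introduced here for illustration, with active inhibitors taken as the positive class.

```latex
\begin{align}
\mathrm{SE} &= \frac{TP}{TP + FN}, \qquad
\mathrm{SP} = \frac{TN}{TN + FP}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \\[4pt]
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align}
```

SE and SP each describe one class only, whereas MCC uses all four counts and ranges from -1 to 1, which makes it more informative than accuracy when the two classes differ in size.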

Classification model performance (dataset: 413 HCV NS3 protease inhibitors)

| Model | Algorithm | Descriptors | Training set accuracy (%) | Test set SE | Test set SP | Test set accuracy (%) | Test set MCC |
|---|---|---|---|---|---|---|---|
| Model 1A | SVM | 2 CORINA global, 13 CORINA 2D | 81.97 | 0.88 | 0.52 | 75.63 | 0.443 |
| Model 1B | SVM | 1 CORINA global, 12 CORINA 2D | 89.80 | 0.95 | 0.62 | 83.19 | 0.624 |
| Model 2A | SVM | 3 CORINA global, 11 CORINA 2D | 83.67 | 0.88 | 0.57 | 77.31 | 0.485 |
| Model 2B | SVM | 1 CORINA global, 19 CORINA 2D | 91.50 | 0.92 | 0.88 | 90.76 | 0.799 |
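
The test-set figures can be cross-checked with a few lines of Python. The sketch below is illustrative only. This page does not state the size or class balance of the test set, so the counts used here (77 active and 42 inactive compounds, 119 in total) and the per-model true-positive and true-negative counts are assumptions, back-calculated from the reported percentages. The function name `metrics` and the dictionary `INFERRED_COUNTS` are hypothetical names chosen for this example.

```python
from math import sqrt

# ASSUMPTION: test set of 77 actives and 42 inactives (119 compounds).
# These counts are not given in the source; they are inferred from the
# reported SE, SP and accuracy values and should be checked against the paper.
N_ACTIVE, N_INACTIVE = 77, 42

# ASSUMPTION: (true positives, true negatives) per model, inferred the same way.
INFERRED_COUNTS = {
    "Model 1A": (68, 22),
    "Model 1B": (73, 26),
    "Model 2A": (68, 24),
    "Model 2B": (71, 37),
}


def metrics(tp, tn, n_pos=N_ACTIVE, n_neg=N_INACTIVE):
    """Return SE, SP, accuracy (%) and MCC from confusion-matrix counts."""
    fn = n_pos - tp  # actives predicted as inactive
    fp = n_neg - tn  # inactives predicted as active
    se = tp / n_pos
    sp = tn / n_neg
    accuracy_pct = 100.0 * (tp + tn) / (n_pos + n_neg)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"SE": se, "SP": sp, "accuracy_pct": accuracy_pct, "MCC": mcc}


if __name__ == "__main__":
    for name, (tp, tn) in INFERRED_COUNTS.items():
        m = metrics(tp, tn)
        print(
            f"{name}: SE={m['SE']:.2f} SP={m['SP']:.2f} "
            f"accuracy={m['accuracy_pct']:.2f}% MCC={m['MCC']:.3f}"
        )
```

Under these assumed counts, the computed values round to the figures in the table for all four models, which suggests the reported metrics are internally consistent. It does not confirm the actual test-set composition.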